Licensed under the Creative Commons attribution-noncommercial license. Please share and remix noncommercially, mentioning its origin.

Introduction

This tutorial aims to help you get started using pomp as a suite of tools for analysis of time series data based on stochastic dynamical systems models. First, we give some conceptual background regarding the class of models—partially observed Markov processes (POMPs)—that pomp handles. We then discuss some preliminaries: installing the package and so on. Next, we show how to simulate a POMP using pomp. We then analyze some data using a few different tools, illustrating some of the package’s capabilities by using its algorithms to fit, compare, and criticize the models. From time to time, exercises for the reader are given.

Partially observed Markov process (POMP) models

Structure of a POMP

As its name implies, pomp is focused on partially observed Markov process models. These are also known as state-space models, hidden Markov models, and stochastic dynamical systems models. Such models consist of an unobserved Markov state process, connected to the data via an explicit model of the observation process. We refer to the former as the latent process model (or process model for short) and the latter as the measurement model.

Mathematically, each model is a probability distribution. Let \(Y_n\) denote the measurement process and \(X_n\) the latent state process. Then, by definition, the process model is determined by the density \(f_{X_n|X_{n-1}}\) and the initial density \(f_{X_0}\). The measurement process is determined by the density \(f_{Y_n|X_n}\). These two submodels determine the full POMP model, i.e., the joint distribution \(f_{X_{0:N},Y_{1:N}}\). If we have a sequence of measurements, \(y^*_{n}\), made at times \(t_n\), \(n=1,\dots,N\), then we think of these data, collectively, as a single realization of the \(Y\) process.


The following schematic shows the causal relations among the process model, the measurement model, and the data. From the statistical point of view, the key perspective is that the model is, at least hypothetically, the process that generated the data.

Structure of a POMP. Arrows show the direction of causality. The closed loop from the state process to itself indicates the dynamic nature of this Markovian process. Information flow runs in the opposite direction.



Mathematically, a POMP is characterized by two conditions.

  1. The state process, \(X_n\), is Markovian, i.e., \[\mathrm{Prob}[X_n|X_0,\dots,X_{n-1},Y_1,\dots,Y_{n-1}]=\mathrm{Prob}[X_n|X_{n-1}].\]
  2. The measurements, \(Y_n\), depend only on the state at that time: \[\mathrm{Prob}[Y_n|X_0,\dots,X_{n},Y_1,\dots,Y_{n-1}]=\mathrm{Prob}[Y_n|X_{n}],\] for all \(n=1,\dots,N\).

These conditions can be represented schematically by the following diagram, which indicates the direct dependencies among model variables.



Conditional independence graph of a POMP. The latent state \(X_n\) at time \(t_n\) is conditionally independent of its history given \(X_{n-1}\). The observation \(Y_n\) is conditionally independent of all other variables given \(X_n\). The underlying time scale can be taken to be either discrete or continuous, and the observation times need not be equally spaced.


Basic model components

To implement a POMP model in pomp, we have to specify the measurement and process distributions. Note, however, that for each of the process and measurement models there are two distinct operations we might wish to perform:

  1. we might wish to simulate, i.e., to draw a (pseudo)random sample from the distribution, or
  2. we might wish to evaluate the density itself at given values of \(X_n\) and/or \(Y_n\).

Following the R convention, we refer to the simulation of \(f_{X_n|X_{n-1}}\) as the rprocess component of the POMP model and the evaluation of \(f_{X_n|X_{n-1}}\) as the dprocess component. Similarly, the simulation of \(f_{Y_n|X_n}\) is the rmeasure component while the evaluation of \(f_{Y_n|X_n}\) is the dmeasure component. Finally, we’ll call a simulator of \(f_{X_0}\) the rinit component. Collectively, we’ll refer to these, and other, similarly basic elements of the model, as the model’s basic components.

The plug-and-play property

A method that makes no use of the dprocess component is said to be “plug-and-play” or to have the “plug-and-play property”. At present, pomp is focused on such methods, so there is no reason to discuss the dprocess component further in this document. In the following, we will illustrate and explain how one specifies the rprocess, rmeasure, and dmeasure components of a model in pomp. We will illustrate this using some simple examples.

Preliminaries

Installing the package

To get started, we must install pomp, if it is not already installed. This package cannot yet be downloaded from CRAN (though that will change in the near future). However, the latest version is always available at the package homepage on GitHub. See the package website for installation instructions.

Important information for Windows and Mac users.

In this document, we will ultimately learn to implement POMP models using the package’s “C snippet” facility. This allows the user to write model components using snippets of C code, which is then compiled and linked into a running R session. This typically affords a manyfold increase in computational speed. It is possible to avoid C snippets entirely by writing model components as R functions, and we will begin by doing so, but the speed-ups afforded by C snippets are typically too good to pass up. To use C snippets, you must be able to compile C code. Compilers are not installed by default on Windows or Mac systems, so users of such systems must do a bit more work to make use of pomp’s facilities. The installation instructions on the package website give details.

Simulation of a POMP

Having dispensed with the preliminaries, we now explore some of the functionality provided by pomp. To assist the reader in following this exploration, the R codes for this document are available.

The latent state process

Let us see how to implement a very simple POMP model. In particular, let’s begin by focusing on the famous Ricker model (Ricker 1954), which posits a nonlinear relationship between the size, \(N(t)\), of a population in year \(t\) and its size, \(N(t+1)\), one year later: \[N(t+1)=r\,N(t)\,\exp\left(1-\frac{N(t)}{K}\right).\tag{1}\] Here, \(r\) and \(K\) are constant parameters, usually termed the intrinsic growth rate and the carrying capacity, respectively. As written, this is a deterministic model: it does not allow for any variability in the population dynamics. Let’s modify the Ricker equation by assuming that \(r\) is not constant, but instead has random variation from year to year. If we assume that the intrinsic growth rate varies from year to year as a lognormal random variable, we obtain \[N(t+1)=r\,N(t)\,\exp\left(1-\frac{N(t)}{K}+\varepsilon(t)\right),\tag{2}\] where \(\varepsilon(t)\sim\mathrm{Normal}(0,\sigma)\). Note that we’ve introduced a new parameter, \(\sigma\), which quantifies the intensity of the noise in the population dynamics. Ecologically speaking, Eq. 2 is a model with environmental stochasticity.

Typically, it is relatively straightforward to simulate a POMP model. To accomplish this in pomp, as we’ve already discussed, we specify the rprocess component of the model. We’ll also need to choose values for the model parameters, \(r\), \(K\), and \(\sigma\), and to make a choice regarding the initial condition, \(N(0)\). The simplest choice is to treat \(N(0)=N_0\) as a parameter.
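A sketch of the simulation call described here, written with the rinit and rprocess components as R functions. The object name sim1 is used later in the text; the particular parameter values are arbitrary illustrative choices, not values prescribed by the text.

```r
library(pomp)

simulate(
  t0 = 0, times = 1:20,
  ## rinit: draw from the initial-state distribution
  rinit = function (N_0, ...) {
    c(N = N_0)
  },
  ## rprocess: one step of the stochastic Ricker map (Eq. 2)
  rprocess = discrete_time(
    function (N, r, K, sigma, ...) {
      eps <- rnorm(n = 1, mean = 0, sd = sigma)
      c(N = r*N*exp(1-N/K+eps))
    },
    delta.t = 1
  ),
  params = c(r = 1.2, K = 200, sigma = 0.3, N_0 = 50)
) -> sim1
```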

## Warning: 'rmeasure' unspecified: NAs generated.

Notice that we’ve specified the rinit and rprocess components of the model as functions. These functions take as arguments the relevant variables (whether these are state variables or parameters). Importantly, they return named numeric vectors. Names of variables and parameters are very important in pomp. Notice too that we’ve used the discrete_time function, which encodes the fact that our Ricker model is a discrete-time stochastic process (a Markov chain). The first argument of discrete_time is an R function encoding Eq. 2; the second argument specifies the discrete time-step.

Note also that the parameters are furnished in the form of a named vector, and that we’ve specified both t0 and times. The former is the initial time, \(t_0\), i.e., the time at which the initial conditions apply. Since our initial condition is that \(N(0)=N_0\), our initial time is \(t_0=0\). The times argument specifies the observation times \(t_1,\dots,t_N\).

Finally, note that we received a warning about NA values being generated. We will soon see what this is about.

What sort of an object is sim1? If we print it, we see

## <object of class 'pomp'>

sim1 is evidently an object of class ‘pomp’. We refer to these as ‘pomp’ objects.

To get more insight into the structure of sim1, we can use spy:
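For instance, assuming sim1 is the simulated ‘pomp’ object constructed above:

```r
spy(sim1)   # display the model components stored in the object
```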

pomp provides methods for plotting ‘pomp’ objects. For example,
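Assuming sim1 from above, a basic plot is simply:

```r
plot(sim1)
```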

We can also recast a ‘pomp’ object as a data frame:
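The coercion described here is a one-liner (assuming sim1 from above); its result is shown in the table that follows.

```r
as.data.frame(sim1)
```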

time N
1 148.3456
2 274.8027
3 225.5831
4 209.7361
5 248.0677
6 276.0464
7 196.4977
8 248.0970
9 214.0114
10 207.6063
11 258.5272
12 245.7282
13 180.2086
14 234.7697
15 244.2032
16 268.4648
17 199.6789
18 211.7116
19 254.1603
20 222.1691

Casting the ‘pomp’ object as a data frame allows us to use ggplot2 graphics:
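A sketch of such a plot, assuming sim1 is the simulated ‘pomp’ object from above; the particular aesthetics are illustrative choices.

```r
library(ggplot2)

ggplot(
  data = as.data.frame(sim1),
  mapping = aes(x = time, y = N)
) +
  geom_line() + geom_point()
```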

To return to the warning we got when we ran simulate: it was telling us that simulate could not make a random draw from the measurement process because we had not supplied it with any information about this process. In particular, we had not supplied a measurement-model simulator. Let’s now see how to specify the measurement-model simulator, or rmeasure.

The measurement model

Let’s suppose that non-lethal traps are used to sample the population to determine its size. Each year, some number of traps are set out and \(Y_t\) is the number of animals captured. We might posit \[Y_t\;\sim\;\mathrm{Poisson}(b\,N(t)),\] where the parameter \(b\) is proportional to sampling effort, e.g., the number of traps. This is a measurement model, and we can implement a simulator for it by specifying another function:
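A sketch of such a simulator, as an R function; the name rmeas is an illustrative choice. Like rinit and rprocess, it takes only the variables it needs (plus ...) and returns a named numeric vector.

```r
## rmeasure: draw Y ~ Poisson(b*N)
rmeas <- function (N, b, ...) {
  c(Y = rpois(n = 1, lambda = b*N))
}
```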

Note that, again, the rmeasure function need take only the necessary arguments (and ...) and must return a named numeric vector.

Now, in the preceding chunk of code where we construct sim2, there was some redundancy with our earlier construction of sim1. In particular, we specified the same values of t0, times, rinit, and rprocess as before. Since these specifications were stored in sim1, however, we could have simply added in just the new pieces. For example,
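A sketch of this shortcut: supply only the new rmeasure component (and a value for the new parameter \(b\)) to simulate, letting sim1 furnish the rest. The value b = 0.05 is an arbitrary illustrative choice.

```r
simulate(sim1,
  rmeasure = function (N, b, ...) {
    c(Y = rpois(n = 1, lambda = b*N))
  },
  params = c(r = 1.2, K = 200, sigma = 0.3, N_0 = 50, b = 0.05)
) -> sim2
```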

As before, we can examine our handiwork:
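Assuming sim2 from above; the result is shown in the table that follows.

```r
as.data.frame(sim2)
```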

time Y N
1 11 124.0831
2 13 200.2960
3 7 267.4025
4 8 190.2190
5 12 195.8285
6 15 293.0790
7 14 216.3693
8 10 217.2981
9 12 269.7404
10 16 223.0512
11 11 200.4050
12 18 255.2297
13 12 231.3550
14 12 197.5631
15 14 263.8953
16 14 277.9194
17 15 197.6269
18 14 292.5811
19 13 250.8353
20 12 206.4404

Notice that now our simulate call produces samples from both the \(N\) and the \(Y\) distributions. simulate will try to sample from the joint distribution of latent states and observables, but will sample from just the latent state process if the rmeasure component is undefined.

Note on pomp syntax

pomp uses certain syntactic conventions. For example, if x is an object and f is a pomp function, then typically f(x,...) -> y yields an object, y, that contains x plus the results of the f operation, together with additional information pertinent to what f did. Here, ... stands for additional arguments to f. Then, if we call f(y), this operation will perform some kind of repeat of the original operation: though the details will vary with f and y, choices made in the original computation (on x) will typically be respected. To change some of these choices, one can do f(y,...), where the ... stands for options of f that correspond to the new choices.

Moreover, since y contains so much information, this information will typically be reused when we apply a different function, g, to y, when it makes sense to do so.

As of version 4.1, R provides a new pipe syntax that is especially convenient for us. It allows us to construct pipelines of R commands, which leads (once one is used to it) to code that is easier to read. In brief, a command in ordinary R style, such as

h(g(f(x,...),...),...)

where f, g, h are R functions, x is some R object, and ... represents additional arguments to each of the functions, must be read from the inside out, which is at odds with the sequential nature of the computation (first f, then g, then h). An alternative, of course, is to write something like

y <- f(x,...)
z <- g(y,...)
w <- h(z,...)

which multiplies the entities (y, z, w) one must name and keep track of. In contrast, the new pipeline syntax of R allows us to write

x |> f(...) |> g(...) |> h(...)

The object-oriented structure of pomp makes this kind of programming quite natural.

Some experienced R programmers find the pipeline syntax uncomfortable or unnecessary at first, and debugging pipelined code requires a somewhat different approach. Of course, it is not necessary to adopt this style of programming to use pomp, but it is quite natural and, experience shows, quite addictive!

We will use pipelines freely for the remainder of the document.

Important note: If you opt to load the tidyverse (or individual packages therein), be sure to load pomp after loading these packages. There are some name conflicts between the packages that would otherwise cause pomp functions to be masked.


Exercise

Modify the sim2 object constructed above to change the measurement model. In particular, assume that \[Y_t\;\sim\;\mathrm{NegBin}(b\,N(t),\theta),\] where \(b\) and \(\theta\) are parameters. (By this notation, we understand that \(Y_t\) is negative binomially distributed with mean \(b\,N(t)\) and variance \(b\,N(t)+(b\,N(t))^2/\theta\).) Make and plot some simulations for \(t=20,\dots,50\) with different values of the \(\theta\) parameter. [Hint: use the rnbinom function with the mu, size parameterization.]


Parus major population abundance data

We will illustrate some of the pomp data-analysis algorithms by performing a limited analysis on a set of bird abundance data. The data are from annual censuses of a population of Parus major (Great Tit) in Wytham Wood, Oxfordshire. They were retrieved as dataset #10163 from the Global Population Dynamics Database version 2 (NERC Centre for Population Biology, Imperial College, 2010). The original source is McCleery and Perrins (1991). They are provided as part of the package, in the data frame parus. Here we display the data graphically and in a table:

year pop
1960 148
1961 258
1962 185
1963 170
1964 267
1965 239
1966 196
1967 132
1968 167
1969 186
1970 128
1971 227
1972 174
1973 177
1974 137
1975 172
1976 119
1977 226
1978 166
1979 161
1980 199
1981 306
1982 206
1983 350
1984 214
1985 175
1986 211

The basic data-type provided by pomp—the ‘pomp’ object—is designed as a container for a model and data. Let’s construct such an object with these data, together with the Ricker model we’ve already implemented. We do this with a call to the constructor function pomp:
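A sketch of this construction, combining the parus data with the model components developed above. The object name rick is an illustrative choice; note that rmeasure now returns a variable named pop, to match the data.

```r
library(pomp)

parus |>
  pomp(
    times = "year", t0 = 1960,
    rinit = function (N_0, ...) c(N = N_0),
    rprocess = discrete_time(
      function (N, r, K, sigma, ...) {
        eps <- rnorm(n = 1, mean = 0, sd = sigma)
        c(N = r*N*exp(1-N/K+eps))
      },
      delta.t = 1
    ),
    rmeasure = function (N, b, ...) {
      c(pop = rpois(n = 1, lambda = b*N))
    }
  ) -> rick
```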

Notice that the measurement model simulator (rmeasure) is not the same as before, reflecting the fact that the data are named pop, not Y as before. Also notice that we specify the time variable by name, thus allowing the data to dictate the observation times. The t0 argument specifies that the stochastic latent state process is initialized in year 1960.

Continuous-time process models

We’ve already shown how to implement a discrete-time model; let’s now see how to implement a continuous-time model. We’ll implement a very simple model of stable but stochastic population dynamics, the logistic, or Verhulst-Pearl, equation with environmental stochasticity. We’ll write this model as a stochastic differential equation (SDE), specifically an Itô diffusion: \[dN = r\,N\,\left(1-\frac{N}{K}\right)\,dt+\sigma\,N\,dW(t),\tag{3}\] where \(N\) is the population size, \(r\) is a fixed rate, the so-called “Malthusian parameter”, \(K\) is the population’s “carrying capacity”, \(\sigma\) describes the intensity of extrinsic stochastic forcing on the system, and \(dW\) is an increment of a standard Wiener process. [Those unfamiliar with Wiener processes and Itô diffusions will not go far wrong thinking of \(dW(t)\), for each time \(t\), as a normal random variable with mean zero and standard deviation \(\sqrt{dt}\).] To implement this model in pomp, we must tell the package how to simulate this model. The easiest way to simulate such an SDE is via the Euler-Maruyama method. In this approximation, we take a large number of very small steps, each of duration \(\Delta t\). At each step, we hold the right-hand side of the above equation constant, compute \(\Delta N\) using that equation, and increment \(N\) accordingly. pomp gives us the euler function to assist us in implementing the Euler-Maruyama method. To use it, we must encode the computations that take a single step. As before, we can do so by writing a function.
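A sketch of such a step function; the name vpstep is an illustrative choice. When pomp calls it, the argument delta.t holds the duration of the Euler step being taken.

```r
## one Euler-Maruyama step of Eq. 3
vpstep <- function (N, r, K, sigma, delta.t, ...) {
  dW <- rnorm(n = 1, mean = 0, sd = sqrt(delta.t))
  c(N = N + r*N*(1-N/K)*delta.t + sigma*N*dW)
}
```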

This function computes the value of \(N(t+\Delta t)\) given the value of \(N(t)\), \(\Delta t\), and the parameters.

We fold this with the data into a ‘pomp’ object via a call to the ‘pomp’-object constructor function, pomp. We’ll also include the same measurement model we used before. Because everything but the ‘rprocess’ component is the same as with the Ricker model, to accomplish this, we can simply do
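A sketch, assuming the Ricker ‘pomp’ object is stored as rick and the Euler step function as vpstep (both illustrative names); only the rprocess component is replaced.

```r
rick |>
  pomp(
    rprocess = euler(vpstep, delta.t = 1/365)
  ) -> vp
```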

Notice that we’ve specified an Euler-Maruyama step of about 1 day: it will take 365 of these steps to get us from one observation to the next.

As before, we can plot this ‘pomp’ object, recast it as a data frame, and simulate it for any given choice of the parameters.


Exercise

The following codes produce several simulations for parameters \(r=0.5\), \(K=2000\), \(\sigma=0.1\) and \(b=0.1\), and plot them on the same axes as the data. Notice that the format="data.frame" and include.data=TRUE options facilitate this. Vary the parameters to try to achieve a better fit to the data, as judged purely “by eye”.
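A sketch of such code, assuming the Verhulst-Pearl ‘pomp’ object is stored as vp (an illustrative name). The exercise specifies \(r\), \(K\), \(\sigma\), and \(b\); the value of \(N_0\) below is an additional illustrative guess.

```r
library(ggplot2)

vp |>
  simulate(
    params = c(r = 0.5, K = 2000, sigma = 0.1, b = 0.1, N_0 = 2000),
    nsim = 9, format = "data.frame", include.data = TRUE
  ) -> sims

## simulations (grey) and data (distinguished by .id == "data")
ggplot(
  data = sims,
  mapping = aes(x = year, y = pop, group = .id, color = (.id == "data"))
) +
  geom_line() +
  guides(color = "none")
```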


C snippets

To this point, we’ve implemented two models by specifying their basic components as R functions. While this has the virtue of transparency, it puts severe constraints on computational performance due to intrinsic limits in the speed with which R codes can be interpreted. If all we want to do is run a few simulations, this is not a problem. As we’ll see, however, in attempting to perform parameter estimation and other inferences, we will need all the speed we can readily get. We achieve potentially massive speed-ups by implementing our basic model components in a language that can be compiled. pomp makes this easy. The key innovation is the C snippet. Let’s see how to code up the Ricker and Verhulst-Pearl models using C snippets. First, we’ll code up the rinit and rmeasure components, which the two models share.
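Sketches of these shared components as C snippets; the names rinitC and rmeasC are illustrative choices.

```r
library(pomp)

rinitC <- Csnippet("N = N_0;")
rmeasC <- Csnippet("pop = rpois(b*N);")
```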

These are about as simple as one can get. As the name suggests, each is just a snippet of C code: not all variables are declared (in fact, in these examples, none are) and the context of the snippet is not specified. In particular, these snippets are not actually complete C functions, so by themselves, they cannot be compiled. They must not violate the rules of C syntax, however. Among other things, lines must end with a semicolon (;), variable names must respect C restrictions, etc. Although one can enclose essentially arbitrarily complex C code in a C snippet, one can do quite a lot with very simple snippets. If you are new (even very new) to C, don’t worry: once you master a few simple rules, you’ll be able to code up C snippets just as easily as you code up R functions, and you’ll come to appreciate the resulting speed-up.

Now we’ll code the Ricker and Verhulst-Pearl simulation steps as C snippets.
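Sketches of the two step snippets, under the names rickstepC and vpstepC used in the discussion below. In the Verhulst-Pearl snippet, dt is the Euler step size supplied by pomp.

```r
rickstepC <- Csnippet("
  double eps = rnorm(0,sigma);
  N = r*N*exp(1-N/K+eps);
")

vpstepC <- Csnippet("
  double dW = rnorm(0,sqrt(dt));
  N += r*N*(1-N/K)*dt+sigma*N*dW;
")
```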

A few observations are in order. First, note that the local variables eps and dW are declared “double”, the standard C data-type for floating point numbers. Observe that dt is a variable that is defined in the context of the vpstepC snippet; when this snippet is executed, dt will be provided by pomp and will be equal to the size of the Euler step actually being taken. Note also that in each of these snippets, the value of N gets over-written by its new value at the next time-step. This is the goal of the C snippets we supply to specify the ‘rprocess’ component of a model. Finally, notice that neither the state variable N nor any of the parameters are declared. The declarations will be handled in a different way, as we’ll see in a moment.

When furnished with one or more C snippets, pomp will provide the necessary declarations and context, compile the resulting C code, dynamically link to it, and use it whenever the corresponding basic model component is needed. We cause all this to happen when we construct an object of class pomp via a call to the constructor function. Let’s do this for the two models now.
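A sketch of the two constructor calls, assuming the snippets above are stored as rinitC, rmeasC, rickstepC, and vpstepC (the latter two names are used in the text; the former two are illustrative). The object names rickC and vpC are used in the remainder of the document.

```r
parus |>
  pomp(
    times = "year", t0 = 1960,
    rinit = rinitC, rmeasure = rmeasC,
    rprocess = discrete_time(rickstepC, delta.t = 1),
    statenames = "N",
    paramnames = c("r","K","sigma","b","N_0")
  ) -> rickC

parus |>
  pomp(
    times = "year", t0 = 1960,
    rinit = rinitC, rmeasure = rmeasC,
    rprocess = euler(vpstepC, delta.t = 1/365),
    statenames = "N",
    paramnames = c("r","K","sigma","b","N_0")
  ) -> vpC
```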

In these calls, we use the statenames and paramnames arguments to indicate which of the undeclared variables in the C snippets rickstepC and vpstepC are state variables and which are fixed parameters. Since dW and eps are declared as local variables within the C snippets themselves, we don’t need to mention them here. The rnorm and rpois functions are part of the R API: see the manual on “Writing R Extensions” for a description of these and the other distribution functions provided as part of the R API. A full set of rules for writing pomp C snippets is given in the package help (?Csnippet).


Exercise

Using the ‘pomp’ objects just constructed, explore model simulations at a variety of different parameters. As before, plot simulations and data on the same axes for comparison purposes.


Exercise

Write a C snippet implementing the negative binomial measurement model you explored previously. Fold it into the ‘pomp’ objects just constructed. Remember, there is no need to re-specify components you have already specified: by calling pomp on a ‘pomp’ object you can modify some or all of the basic model components. Plot simulations and data on the same axes, as in the immediately preceding exercise.


The dmeasure component

As mentioned in the introduction, the dmeasure component is the other side of the rmeasure component. The latter simulates the measurement model whereas the former evaluates the measurement model’s probability density function. The following C snippet encodes the dmeasure component.
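The snippet, stored under the name dmeas used in the discussion below:

```r
dmeas <- Csnippet("lik = dpois(pop,b*N,give_log);")
```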

Here, dpois again comes from the R API. It takes three arguments, the datum (pop), the Poisson parameter (b*N), and give_log. When give_log=0, dpois returns the Poisson likelihood; when give_log=1, dpois returns the log of this likelihood. When this snippet is executed, pomp will provide the value of give_log according to its needs. It is the user’s responsibility to make sure that the correct value is returned for both possible values of give_log. This is one of the most common places where newbies make mistakes!


Exercise

Write the dmeasure component for your negative binomial model both as an R function and as a C snippet.


The particle filter

We are now in a position to be able to compute the likelihood of the data given any set of parameters for either of our models. For this purpose, we use the particle filter. This powerful algorithm is at the heart of several of pomp’s inference methods. We won’t describe the theory of the particle filter here. The tutorial by Arulampalam, Maskell, Gordon, and Clapp (2002) explains the theory in an accessible way. The pomp Journal of Statistical Software paper gives pseudocode and some examples. The pomp documentation page lists several other tutorial documents that go into more detail.

In pomp, the simplest version of the particle filter is implemented in the function pfilter. Its only required arguments are the ‘pomp’ object and the number of particles, i.e., the Monte Carlo sample size.
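A sketch of the call, assuming rickC and the dmeas snippet from above; the parameter values are arbitrary choices for illustration.

```r
rickC |>
  pfilter(
    Np = 1000,
    dmeasure = dmeas,
    paramnames = "b", statenames = "N",
    params = c(r = 1.2, K = 200, sigma = 0.3, b = 1, N_0 = 50)
  ) -> pfrick
```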

Notice that, in this call, we specified the dmeasure component using the C snippet we wrote above. What would have happened had we not specified this?

Notice that, because we here introduced a new C snippet, we again had to indicate which of the undeclared variables in dmeas are parameters and which are latent state variables.

What kind of object is pfrick?

## <object of class 'pfilterd_pomp'>

As a ‘pfilterd_pomp’ object, pfrick contains rickC plus a wealth of information regarding the particle filter operation that created it. For example, we can extract the estimated log likelihood at these (arbitrarily chosen) parameters:
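The extractor is logLik:

```r
logLik(pfrick)
```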

## [1] -148.5748

There is also a plot method for ‘pfilterd_pomp’ objects and one for coercing them to data frames.
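Assuming pfrick from above; the data-frame coercion yields the table that follows.

```r
plot(pfrick)
as.data.frame(pfrick)
```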

year pop ess cond.logLik
1960 148 1000.00000 -3.879800
1961 258 276.91478 -5.329129
1962 185 245.07346 -5.244873
1963 170 183.89624 -5.551510
1964 267 245.35009 -5.443870
1965 239 291.78302 -5.235775
1966 196 274.13378 -5.152608
1967 132 66.23966 -6.545454
1968 167 225.27473 -5.331978
1969 186 249.96622 -5.263931
1970 128 56.39030 -6.692253
1971 227 302.62524 -5.174603
1972 174 202.36717 -5.441043
1973 177 211.26047 -5.369764
1974 137 83.00157 -6.269213
1975 172 238.82351 -5.239950
1976 119 45.15378 -6.853124
1977 226 299.48172 -5.172145
1978 166 184.60553 -5.521961
1979 161 146.71626 -5.714835
1980 199 269.30266 -5.230779
1981 306 193.94797 -5.754881
1982 206 313.84250 -5.057105
1983 350 117.75799 -6.321934
1984 214 296.00247 -5.170972
1985 175 203.08438 -5.449372
1986 211 293.20228 -5.161943

Both of these reveal that pfrick contains information about the effective sample size of the particle filter (ess) and the conditional log likelihood, cond.logLik, which in the notation of the introduction, is \[\log f_{Y_n|Y_{1:n-1}}(y_n^*|y_{1:n-1}^*).\]

The particle filter is a Monte Carlo algorithm. Accordingly, it gives us only a noisy estimate of the likelihood. We can reduce this noise by increasing the number of particles, Np, and we can estimate the magnitude of the Monte Carlo error by running a few independent particle filters. For example:
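A sketch of such a computation: ten independent particle filters at the same parameters, followed by an average on the likelihood scale. Here pfrick is the ‘pfilterd_pomp’ object from above.

```r
replicate(10, pfrick |> pfilter() |> logLik()) -> lls
lls
logmeanexp(lls, se = TRUE)
```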

##  [1] -148.4803 -148.5014 -148.8655 -148.6632 -149.2521 -148.1796 -148.0358
##  [8] -148.5363 -148.4578 -148.5291
##                        se 
## -148.5019709    0.1004082

In the first line, notice that we did not need to specify Np, despite the fact that there is no default value of this parameter. Indeed, because pfrick is a ‘pfilterd_pomp’ object, it knows the value of Np that was used in its own creation. By default, this same value is used again when it is passed to pfilter. We could, of course, have used a different value simply by specifying Np in this call to pfilter.

The last line of the preceding code chunk computes the log of the mean of the estimated likelihoods and the standard error on this mean via the delta method. Since the particle filter gives an unbiased estimate of the likelihood (not the log likelihood), this operation is sensible, provided the Monte Carlo error is not too large.

Let’s repeat the operation for the Verhulst-Pearl model, again at arbitrary parameters.
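A sketch, assuming vpC and dmeas from above; the parameter values and the name pfvp are illustrative.

```r
vpC |>
  pfilter(
    Np = 1000,
    dmeasure = dmeas,
    paramnames = "b", statenames = "N",
    params = c(r = 0.5, K = 2000, sigma = 0.1, b = 0.1, N_0 = 2000)
  ) -> pfvp
logLik(pfvp)
```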


Exercise

Compute the likelihood for the parameters you found in your attempt to estimate parameters “by eye”.


Trajectory matching

Trajectory matching is the method of estimating the parameters of a deterministic model by fitting the model to data assuming independent measurement errors. Although pomp’s main focus is on stochastic models, it does provide facilities for trajectory matching. pomp makes a conceptual distinction between the stochastic process and a deterministic skeleton, which we can view as a deterministic model related to the stochastic process’ central tendency. We’ll not go into mathematical details here: instead, we’ll illustrate with two examples.

A discrete-time deterministic skeleton

A deterministic skeleton of the stochastic Ricker model is the Ricker map, Eq. 1. We implement this for pomp, again either as an R function or a C snippet and pass it to pomp functions via the skeleton argument. For example:
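A sketch, adding the skeleton to the rickC object constructed above; since this introduces a new C snippet, statenames and paramnames must again be supplied.

```r
rickC |>
  pomp(
    skeleton = map(Csnippet("DN = r*N*exp(1-N/K);"), delta.t = 1),
    paramnames = c("r","K"), statenames = "N"
  ) -> rickC
```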

Here, the left-hand side of Eq. 1 is indicated by the D prefix: in skeleton snippets, we don’t over-write the state variable N. We indicate that the skeleton is a discrete-time map using the map function.

A continuous-time deterministic skeleton

A deterministic skeleton of the Verhulst-Pearl model is the vectorfield (or ordinary differential equation), \[\frac{dN}{dt} = r\,N\,\left(1-\frac{N}{K}\right).\] We fold this into the vpC ‘pomp’ object so:
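A sketch of the corresponding call for vpC:

```r
vpC |>
  pomp(
    skeleton = vectorfield(Csnippet("DN = r*N*(1-N/K);")),
    paramnames = c("r","K"), statenames = "N"
  ) -> vpC
```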

Since the skeleton here is a vectorfield, in this C snippet, DN is filled with the value of the time-derivative of N. See the package help (?Csnippet) for a complete set of rules for writing C snippets.

Trajectories of the deterministic skeleton

With the deterministic skeleton in place we can generate trajectories of the skeleton using trajectory. For example:
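A sketch, assuming vpC now carries the vectorfield skeleton; the parameter values and the name traj1 are illustrative.

```r
vpC |>
  trajectory(
    params = c(r = 0.5, K = 2000, sigma = 0.1, b = 0.1, N_0 = 2000),
    format = "data.frame"
  ) -> traj1
```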

As with simulate, one can use trajectory to compute multiple trajectories at once, for varying values of the parameters.

Parameter estimation using trajectory matching

In pomp, the function traj_objfun constructs an objective function quantifying the mismatch between model predictions and data. For this purpose, it uses the dmeasure component of the model. This function can be given to any of the large variety of numerical optimizers available in R and R packages. These optimizers search parameter space to find parameters under which the likelihood of the data, given a trajectory of the deterministic skeleton, is maximized.

We’ll demonstrate using the Verhulst-Pearl model.
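A sketch of the invocation, assuming vpC and the dmeas snippet from above; the starting values in params are illustrative guesses.

```r
vpC |>
  traj_objfun(
    est = c("K","N_0"),
    params = c(r = 0.5, K = 2000, sigma = 0.1, b = 0.1, N_0 = 2000),
    dmeasure = dmeas,
    paramnames = "b", statenames = "N"
  ) -> ofun
```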

This invocation of traj_objfun creates an objective function, ofun, that can be used to estimate the two parameters \(K\) and \(N_0\). It will hold the other three parameters, \(r\), \(\sigma\), and \(b\), fixed at the values given in params.

Notice that, in this code chunk, we had to specify dmeasure once again. Why? What would have happened had we not done so?

What kind of object is ofun?

## <object of class 'traj_match_objfun'>

A ‘traj_match_objfun’ object, like the objective functions created by the pomp functions spect_objfun, probe_objfun, and nlf_objfun, is an R function, but it is also more than an R function. It contains not only the vpC ‘pomp’ object, but it additionally saves information each time it is evaluated. Let’s see how such stateful objective functions make it easy to use a wide range of numerical optimization routines.

We can evaluate ofun at any point in the 2-dimensional \(K\)-\(N_0\) space. For example:

## [1] 1161.543

The value returned by ofun is the negative log likelihood, as returned by the model’s dmeasure component. We can estimate the parameters using, for example, the subplex algorithm implemented in the subplex package:
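A sketch of the optimization, assuming ofun from above; the starting guesses for \((K, N_0)\) are arbitrary.

```r
library(subplex)

subplex(par = c(2000, 1500), fn = ofun) -> fit
fit
```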

## $par
## [1] 1968.474 1895.245
## 
## $value
## [1] 276.2893
## 
## $counts
## [1] 318
## 
## $convergence
## [1] 0
## 
## $message
## [1] "success! tolerance satisfied"
## 
## $hessian
## NULL

Note that fit contains, among other things, the estimated parameters (element par) and the minimized value of the negative log likelihood (value).

To make absolutely certain that ofun remembers these estimates, we evaluate it once at fit$par:
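These calls (assuming ofun and fit from above) produce, in order, the objective value, the stored coefficients, and the log likelihood shown below.

```r
ofun(fit$par)
coef(ofun)
logLik(ofun)
```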

## [1] 276.2893
##        r        K    sigma        b      N_0 
##    0.500 1968.474    0.100    0.100 1895.245
## [1] -276.2893

Then we can, for example, extract the fitted trajectory thus:
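One way to do this, assuming ofun is the fitted objective function from above, is via pomp’s trajectory method:

```r
## Extract the fitted deterministic trajectory as a data frame.
ofun |>
  trajectory(format = "data.frame") -> traj
```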

We can superimpose the model predictions on the data as follows.

Parameter transformations

Very commonly, model parameters must obey certain constraints. For example, the parameters in the two models we’ve looked at so far are all constrained to be positive. In estimating parameters, however, one frequently wants to employ a numerical optimization method that does not respect constraints. One way of accommodating such unconstrained optimizers is to transform the parameter space so that the constraints disappear. For example, by log-transforming the \(r\) parameter in the Verhulst-Pearl model (Eq. 3), one obtains the superficially different equation \[dN = e^\rho\,N\,\left(1-\frac{N}{K}\right)\,dt+\sigma\,N\,dW(t),\] where \(\rho=\log{r}\). In this model, \(\rho\) can take any real (positive or negative) value while \(r=e^\rho\) remains positive.
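The round trip is elementary to check in plain R; to_est and from_est are hypothetical names for the two directions of the transformation, not pomp functions:

```r
to_est   <- function (r) log(r)      # natural scale -> estimation scale
from_est <- function (rho) exp(rho)  # estimation scale -> natural scale

from_est(to_est(0.5))  # recovers r = 0.5
from_est(-3)           # any real rho maps back to a positive r
```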

We incorporate parameter transformations using the partrans argument to many pomp functions, specifying them using the parameter_trans function. In general, the parameter transformations, like other basic model components, can be supplied using R functions or C snippets. If we are merely log-transforming, logit-transforming, or log-barycentric-transforming parameters, however, it is even easier. The following code chunk implements log transformation of all the parameters of the Verhulst-Pearl and Ricker models.
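A sketch of such a chunk, assuming the object is to be named vpr_partrans as in the text:

```r
library(pomp)

## Log-transform all five parameters; parameter_trans builds the
## to-estimation-scale and from-estimation-scale maps automatically.
vpr_partrans <- parameter_trans(log = c("r", "K", "sigma", "b", "N_0"))
```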

We could then provide vpr_partrans as needed to any of the various pomp inference methods, via the partrans argument. For example, to estimate parameters for the Verhulst-Pearl model on the transformed scale, we do

## [1] 276.2893
##             r             K         sigma             b           N_0 
##    0.50000000 2506.85237865    0.10000000    0.07852373 2413.59563164

Exercise

Estimate the parameters for the Verhulst-Pearl model assuming negative binomial errors. Do not attempt, at first, to estimate all parameters simultaneously. Focus on estimating \(K\), \(N_0\) and the parameters of the measurement model. It will probably be helpful to make use of parameter transformations to enforce the model constraints.


Exercise

Estimate some or all of the parameters of the Ricker model using trajectory matching. It is probably a good idea to use parameter transformations.


Maximizing the likelihood by iterated filtering

Let us now turn to the main focus of the pomp package: parameter estimation for fully stochastic models. Iterated filtering is a method for maximizing the likelihood. The method was introduced in its original form by Ionides, Bretó, and King (2006), and was subsequently much improved by Ionides, Nguyen, Atchadé, Stoev, and King (2015). The latter paper rigorously expounds the theory. An addendum to the pomp Journal of Statistical Software paper provides pseudocode and a simple example. Finally, a tutorial on the theory and practice is linked from the pomp documentation index. Here, we confine ourselves to demonstrating how the IF2 algorithm (Ionides et al. 2015) is applied to the toy examples we have been discussing.

Estimating the log likelihood

Now, although we have observed the intended improvement in the log likelihood, we should be careful to note that the log likelihood displayed in this plot is the log likelihood of the perturbed model. This model differs from the one we are interested in. To compute the likelihood of our focal model at the parameter returned by mif2, we need to perform a few particle filter operations:
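These operations might look like the following sketch, where mf is assumed to hold the result of the mif2 computation, and the number of replicates and particles are arbitrary illustrative choices:

```r
library(pomp)

## Several independent particle-filter likelihood evaluations at the
## mif2 estimate, averaged on the likelihood scale, with a standard error.
replicate(10, logLik(pfilter(mf, Np = 5000))) |>
  logmeanexp(se = TRUE)
```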

##                        se 
## -143.9880041    0.1035849

Note regarding parallelization

The above foreach call, as written, will result in serial, not parallel, execution of the 100 computations. To actually parallelize, one must register a parallel backend. A good first choice is doParallel, which is a separate package that you must explicitly load. The following chunk of code shows how to do this, and prints the number of “workers” that result.
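A minimal version of that chunk (the worker count reported will depend on your machine):

```r
library(doParallel)

registerDoParallel()  # register a default parallel backend
getDoParWorkers()     # how many workers foreach will now use
```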

doParallel allows one to exploit multiple cores on a single machine, or to expand a parallel computation across a cluster.


Exercise

Use doParallel to parallelize the estimation of likelihood for your negative-binomial model.


More search effort

At this point, there’s no particular reason to suspect that our IF2 searches have arrived at their destination. In general, it is hard to know a priori how much effort will be required to find the MLE. Let’s continue the search, starting with the best points we’ve uncovered so far.

Note that, in the above, we’ve performed a total of 100 mif2 iterations per starting point. The code above also does the post-mif2 likelihood estimation. It returns just the parameter and likelihood estimates.

We can combine the new estimates with the old ones into a general database:

r K sigma N_0 b loglik loglik.se
se910 3.1284977 205.8674 0.6017564 148.7116 1 -141.4642 0.1252795
se68 3.6836844 212.4330 0.6866914 150.3687 1 -141.5363 0.1009751
se501 2.1242664 210.8481 0.4947006 149.6113 1 -141.5978 0.1683296
se57 2.7815993 212.5501 0.5903914 147.4731 1 -141.6073 0.1122423
se21 5.8818882 212.1286 0.7922236 146.5375 1 -141.6152 0.2477002
se531 3.2739549 214.8133 0.6221768 147.2093 1 -141.6218 0.1401377
se101 2.9895684 210.7216 0.6213515 151.8311 1 -141.6329 0.1650331
se351 4.2896991 213.7109 0.7332631 144.9341 1 -141.7211 0.2002205
se171 4.0364387 211.8947 0.6618965 152.0286 1 -141.7507 0.1958240
se88 3.0209499 216.4512 0.5807363 146.8939 1 -141.7678 0.1549183
se471 4.1108053 211.5013 0.6765392 150.1728 1 -141.7688 0.1915084
se33 2.3271085 211.7919 0.5353136 145.0449 1 -141.7770 0.1775999
se281 3.2229005 212.7592 0.6206668 141.8785 1 -141.7926 0.1718247
se32 3.0347558 214.5267 0.6105621 151.6241 1 -141.8097 0.0736884
se851 2.0603238 212.6789 0.4920679 153.4866 1 -141.8165 0.2027171
se64 4.3675025 217.5551 0.6963499 149.3324 1 -141.8173 0.1738572
se27 4.1651880 212.2039 0.6695203 151.1753 1 -141.8247 0.1801505
se85 3.1618849 217.2689 0.6285643 147.6559 1 -141.8536 0.1126640
se901 1.9808127 216.2770 0.4867885 145.2720 1 -141.8623 0.1030174
se510 3.7246664 211.8903 0.6183127 147.5523 1 -141.8710 0.1594258
se521 4.6876312 216.4202 0.7389916 146.6980 1 -141.8736 0.2505318
se5 3.4430533 211.7797 0.6035404 144.1598 1 -141.8747 0.1598059
se410 2.1976481 216.3994 0.5181651 147.0187 1 -141.8826 0.1948529
se401 2.5064845 210.8978 0.5063740 150.2220 1 -141.8894 0.1728310
se37 2.1808811 207.7584 0.4807931 141.7614 1 -141.8966 0.2286207
se131 5.1484677 210.4429 0.7899178 144.3239 1 -141.8974 0.1460223
se221 2.7833339 213.0620 0.5873151 146.4354 1 -141.9072 0.1512947
se231 3.7572544 216.1443 0.6519619 150.1044 1 -141.9139 0.2052505
se871 3.4976192 218.3220 0.6362724 144.4875 1 -141.9233 0.2324237
se291 3.8456281 209.9777 0.6860800 144.7479 1 -141.9271 0.3207481
se771 4.1157821 215.8445 0.6910520 147.6352 1 -141.9277 0.1539120
se48 6.0016710 213.2330 0.8234306 149.1209 1 -141.9305 0.1337823
se741 1.8741633 217.3076 0.4577219 154.2762 1 -141.9406 0.1423198
se50 4.6644050 212.4280 0.7036091 143.8320 1 -141.9410 0.1437409
se90 4.4312766 212.1907 0.6620579 148.5293 1 -141.9435 0.2071402
se391 9.8615639 213.9890 1.0797859 138.9437 1 -141.9463 0.1200813
se801 3.8205724 210.6642 0.6580608 150.2490 1 -141.9500 0.1053545
se80 4.4608459 217.0975 0.7432218 147.8423 1 -141.9624 0.1261549
se28 6.6624655 213.7130 0.8609550 142.8872 1 -141.9668 0.0661287
se110 4.0725999 212.2438 0.6651480 147.1641 1 -141.9723 0.0981206
se561 2.3707522 208.3692 0.5052141 145.5224 1 -141.9749 0.1205260
se100 2.3852502 210.5481 0.5409204 154.6998 1 -141.9791 0.0833558
se461 3.1669918 214.7740 0.6460144 147.6726 1 -141.9811 0.1696955
se19 5.8047778 210.3349 0.8879165 149.4971 1 -141.9912 0.1118307
se99 4.8518371 220.3252 0.7521458 148.8792 1 -141.9926 0.2007112
se971 5.1198225 214.3683 0.8236767 148.6841 1 -141.9930 0.1245734
se111 4.3874250 209.4525 0.7177486 139.8632 1 -141.9949 0.2023475
se29 5.0952391 208.9361 0.7685322 147.4569 1 -141.9971 0.2633872
se861 5.5352633 213.8764 0.7805744 151.7866 1 -142.0083 0.0880137
se74 1.8407718 207.9801 0.4990658 149.2076 1 -142.0126 0.1439216
se610 2.4085847 213.9201 0.5319374 145.5058 1 -142.0150 0.1398764
se301 4.6886572 203.7468 0.7245849 143.9941 1 -142.0171 0.2440018
se431 2.0495836 212.1069 0.5046400 146.8699 1 -142.0183 0.2122916
se89 2.9882273 212.5430 0.5523072 150.6618 1 -142.0184 0.1451123
se15 4.9341882 216.3921 0.7523905 146.7809 1 -142.0265 0.1272591
se210 6.0634546 216.3534 0.8230841 156.0111 1 -142.0301 0.1741935
se151 5.6200519 218.0458 0.7836643 145.6312 1 -142.0451 0.1528252
se78 2.4784821 212.1743 0.5468547 148.5761 1 -142.0458 0.1208535
se201 3.9895841 217.5986 0.6900153 149.1241 1 -142.0511 0.1856359
se13 1.9756218 220.0869 0.4962517 148.5643 1 -142.0544 0.1486778
se70 2.2689645 212.6579 0.5388237 141.7237 1 -142.0550 0.1234744
se87 1.5679569 215.9923 0.4854862 148.7885 1 -142.0579 0.0716005
se12 7.0888472 213.1705 0.8194034 149.5944 1 -142.0604 0.1869817
se97 5.3971637 214.1662 0.8037923 153.3908 1 -142.0606 0.0824913
se511 2.6064530 214.7371 0.6031710 144.0756 1 -142.0621 0.1395957
se45 3.6336642 219.9406 0.6577585 148.8501 1 -142.0631 0.1517919
se271 5.7154054 215.3635 0.8058923 149.6875 1 -142.0634 0.0794653
se731 3.2941732 211.9734 0.6391056 152.7944 1 -142.0637 0.1309472
se77 3.6827196 215.6859 0.6132396 145.9564 1 -142.0647 0.2486038
se75 3.1895308 217.8538 0.6461034 150.1621 1 -142.0679 0.1839064
se310 2.5951523 214.8603 0.5265493 142.2373 1 -142.0686 0.2316993
se 2.9959552 218.1808 0.6250979 150.5698 1 -142.0892 0.1002993
se54 6.3047576 218.7917 0.8593284 149.6140 1 -142.0918 0.1550471
se811 7.2212227 216.8687 0.9897709 150.5406 1 -142.0987 0.1101978
se371 5.9047985 211.2022 0.7628310 151.1124 1 -142.0997 0.2025309
se621 3.4188381 211.6933 0.6831583 144.0670 1 -142.1023 0.2001714
se781 1.9466807 215.2372 0.5261953 148.9087 1 -142.1053 0.1390146
se181 5.4144235 217.5064 0.7680975 150.3910 1 -142.1055 0.0382852
se91 3.2898830 209.6906 0.6383630 146.2413 1 -142.1090 0.1243632
se381 7.2197462 220.0683 0.8770334 150.4797 1 -142.1091 0.1104086
se94 4.0563201 217.8347 0.7131565 150.8476 1 -142.1135 0.1976744
se211 1.7229944 203.3596 0.4533934 149.1842 1 -142.1145 0.2154220
se10 3.1849554 219.0177 0.6718997 151.4015 1 -142.1162 0.1326665
se691 4.4722466 214.3339 0.8258454 148.5493 1 -142.1240 0.2125286
se571 9.3001200 218.8631 0.9928747 144.1758 1 -142.1249 0.1988678
se551 2.8167334 213.1091 0.6058689 146.1872 1 -142.1311 0.1359119
se95 3.4928000 213.1982 0.6183225 153.9883 1 -142.1351 0.1755830
se43 4.0442732 212.9845 0.6941041 139.2189 1 -142.1378 0.1200866
se491 2.1462619 213.6473 0.5366153 144.6799 1 -142.1433 0.0937312
se40 4.3801499 218.6479 0.7064160 155.6226 1 -142.1472 0.2755120
se141 12.1093455 217.9099 1.1801442 152.4030 1 -142.1474 0.0871457
se34 1.8514751 222.4401 0.4644043 154.0264 1 -142.1494 0.1322392
se52 2.6283871 214.7525 0.5242823 151.5491 1 -142.1505 0.1469142
se79 3.0264315 217.5069 0.6026777 149.4249 1 -142.1554 0.1741575
se411 2.5709698 211.1520 0.5616349 153.9194 1 -142.1567 0.1142745
se341 2.3679149 205.5358 0.4836869 145.2114 1 -142.1604 0.2407589
se53 2.7674680 209.1806 0.5241788 148.7800 1 -142.1669 0.1819148
se821 6.7741092 214.1129 0.8523516 151.7854 1 -142.1726 0.2200961
se36 6.4546193 212.5066 0.9509408 142.2344 1 -142.1861 0.2080422
se881 2.2984091 208.2815 0.5602557 152.0295 1 -142.1930 0.1980915
se331 2.4999546 219.7766 0.5315375 147.5725 1 -142.1937 0.1961146
se761 1.4197868 206.6957 0.4222551 151.5979 1 -142.1946 0.1231460
se831 1.8152579 211.5318 0.4546590 143.8816 1 -142.1977 0.0560870
se361 1.3339196 210.2685 0.4098971 146.2012 1 -142.1978 0.1359682
se16 1.4332292 209.3028 0.4507419 152.9967 1 -142.2001 0.2294799
se241 2.3030114 209.3947 0.5374609 155.7681 1 -142.2250 0.1472330
se941 2.0566538 212.8740 0.5363679 146.1621 1 -142.2251 0.1075507
se67 2.9843188 221.3515 0.6351193 143.8641 1 -142.2268 0.3945456
se55 4.7541970 220.1777 0.7484075 147.9699 1 -142.2277 0.1859566
se39 3.1750824 218.3942 0.5905681 153.2271 1 -142.2323 0.1404262
se611 1.7516138 210.7420 0.4761171 150.0256 1 -142.2327 0.1676586
se321 2.4648818 220.8792 0.5847825 154.3920 1 -142.2424 0.1048529
se25 5.5786179 216.7155 0.8819837 148.8074 1 -142.2450 0.2529216
se671 1.6878280 220.7431 0.4458878 144.7328 1 -142.2495 0.1627279
se23 7.3651987 205.7602 0.9150536 155.8937 1 -142.2559 0.3019883
se751 2.8957658 219.7361 0.5863087 149.2402 1 -142.2570 0.1408042
se931 5.7507380 209.3158 0.8523783 151.6613 1 -142.2631 0.1350489
se11 2.8600067 222.7872 0.5784979 149.7242 1 -142.2660 0.1389689
se891 4.5165948 205.2885 0.6607846 153.1670 1 -142.2700 0.4565496
se93 4.2933442 217.7463 0.7261457 159.1221 1 -142.2758 0.0559741
se651 11.7960567 219.2692 1.1459080 151.9644 1 -142.2777 0.1386404
se251 2.0268740 220.4528 0.5176661 151.7884 1 -142.2858 0.0428206
se311 2.0765488 221.4920 0.4904718 152.8537 1 -142.2926 0.2087475
se73 2.0638341 216.3202 0.4698665 159.9376 1 -142.3016 0.1161037
se41 3.7834826 213.9802 0.7134380 154.3169 1 -142.3027 0.1233489
se14 5.1006870 219.9448 0.7690559 145.7037 1 -142.3049 0.0775627
se65 1.3928353 220.3395 0.4360945 146.7507 1 -142.3109 0.1801548
se1 5.1336944 221.3607 0.8224853 151.0915 1 -142.3265 0.1312451
se841 1.3028994 215.8475 0.3952368 149.5034 1 -142.3314 0.1341823
se30 1.3890788 219.9610 0.4256972 152.8321 1 -142.3354 0.1851503
se66 1.6599215 217.3594 0.4728207 148.9763 1 -142.3419 0.1589647
se51 1.8250592 219.3148 0.4616111 148.7297 1 -142.3537 0.1159887
se22 2.7907135 222.2466 0.6137075 149.3764 1 -142.3638 0.1711753
se161 1.8437644 218.8977 0.4360730 144.9611 1 -142.3653 0.1207717
se581 1.9555691 217.9152 0.5036184 155.5379 1 -142.3713 0.1401370
se601 6.5884019 218.8768 0.8310974 140.1405 1 -142.3731 0.1630810
se121 2.9848595 221.3813 0.6434883 143.7783 1 -142.3776 0.1604775
se481 9.6856744 216.6113 1.0783609 142.4744 1 -142.3810 0.1846644
se7 1.2903030 212.8061 0.4288905 151.0575 1 -142.4250 0.0614888
se62 4.4699444 220.5051 0.6893357 142.7222 1 -142.4264 0.1531607
se721 2.2937505 223.2648 0.5153067 144.0662 1 -142.4321 0.1399298
se661 8.9102779 211.9594 0.9094235 150.0032 1 -142.4392 0.2056491
se98 2.4829217 221.8528 0.5386735 145.8692 1 -142.4488 0.2667045
se791 4.8083932 222.1192 0.7605053 152.9680 1 -142.4750 0.1619737
se92 2.7827761 223.2481 0.6043873 146.8671 1 -142.4768 0.1769846
se641 7.9895070 221.3033 0.9316942 151.9445 1 -142.4914 0.1829834
se69 1.2856743 218.0688 0.4452261 149.7738 1 -142.5097 0.1350066
se47 1.9567534 221.9479 0.4783042 151.4720 1 -142.5109 0.0980184
se810 3.1198680 216.8455 0.5375921 143.4047 1 -142.5250 0.1516736
se541 1.2035350 210.7323 0.4328262 148.3216 1 -142.5343 0.2490127
se961 3.4125578 206.7698 0.5482344 138.6612 1 -142.5406 0.2400610
se83 1.2029504 214.4985 0.3947243 145.8261 1 -142.5441 0.1347069
se591 3.4665245 220.2710 0.7150510 148.2169 1 -142.5626 0.0839920
se35 1.3658834 217.5841 0.4310899 152.6027 1 -142.5742 0.1295106
se58 1.4057486 224.5049 0.4253028 146.7249 1 -142.5823 0.1360865
se951 4.6285587 216.3087 0.8499224 148.4160 1 -142.5840 0.0895421
se6 1.1322125 216.9557 0.3919898 151.3209 1 -142.5867 0.2932362
se441 2.1351396 221.3982 0.4760466 149.4543 1 -142.5924 0.1205329
se710 2.1431614 218.3911 0.4718606 158.3669 1 -142.6212 0.1594888
se44 1.4517573 226.1810 0.4621305 147.2204 1 -142.6315 0.1187729
se42 1.2222893 221.7667 0.3810474 152.3657 1 -142.6656 0.0925501
se451 1.4068583 215.6308 0.4354047 156.8922 1 -142.6841 0.1793084
se84 1.9341479 226.0631 0.4814947 139.3399 1 -142.6941 0.1019777
se191 1.4737427 225.6758 0.4284394 148.5815 1 -142.6952 0.1298417
se63 3.6425148 225.8050 0.7233140 148.6470 1 -142.6974 0.1252213
se46 1.9020734 222.6830 0.4749938 151.6563 1 -142.7144 0.1125245
se711 3.6693562 223.6907 0.7380539 157.5776 1 -142.7164 0.1619572
se8 6.5525776 220.1277 0.8239225 158.6428 1 -142.7522 0.1209183
se17 1.6285714 221.6777 0.4729316 139.1833 1 -142.7724 0.1524665
se24 1.2444478 213.7785 0.3692251 157.1683 1 -142.8135 0.2046277
se61 1.0338581 221.1314 0.3579943 151.0629 1 -142.8454 0.1043648
se60 0.9326411 209.3584 0.3864530 151.1015 1 -142.8526 0.2257688
se3 1.4418048 217.7754 0.4895231 152.2804 1 -142.8610 0.0960146
se631 5.5179233 222.2521 0.7334308 147.8919 1 -142.9005 0.1279039
se261 1.0093882 214.2568 0.3630770 146.7379 1 -142.9088 0.0712823
se921 1.2843772 209.9653 0.3941288 142.4544 1 -142.9513 0.2070331
se59 7.3428620 224.4899 0.8891354 150.5340 1 -143.0071 0.0960857
se681 2.7675571 231.1546 0.5639122 152.6966 1 -143.0106 0.2647826
se76 0.9713643 214.1216 0.4011089 153.2541 1 -143.0869 0.1246783
se56 5.3797994 225.5354 0.7629686 158.5374 1 -143.1509 0.1884951
se9 0.9358958 226.2432 0.4090039 146.7092 1 -143.1698 0.1589043
se86 3.9458758 229.2706 0.6685891 142.7067 1 -143.2176 0.1422381
se701 10.5356122 197.5124 0.9121591 143.7085 1 -143.3303 0.1816937
se31 3.7880720 221.2428 0.5835092 140.7856 1 -143.3359 0.1746295
se2 0.8404929 212.6537 0.3966746 148.0877 1 -143.3973 0.1128334
se82 0.7749453 205.7566 0.3808468 154.5481 1 -143.6279 0.2328349
se20 0.7630085 220.4360 0.3642261 158.0709 1 -143.8688 0.1163206
se911 0.6972483 227.9625 0.3568215 149.4095 1 -143.9481 0.0516060
se96 0.3683433 207.4406 0.3305965 149.5746 1 -145.5778 0.1806662
se4 0.4153476 220.5704 0.3066363 156.1967 1 -145.5873 0.2335461
se81 3.0870348 211.2362 0.4059305 145.7000 1 -146.3165 0.3318164
se49 0.2992951 239.5948 0.3310973 154.6410 1 -146.4044 0.1551113
se18 0.2490440 317.7390 0.3226695 152.9649 1 -147.3309 0.1127292
se72 0.1249130 201.4041 0.3174692 156.5480 1 -148.3428 0.1586663
se421 0.0837456 290.9921 0.3114834 157.6023 1 -148.7578 0.1434920
se71 0.0423409 340.2362 0.3070124 159.2568 1 -149.1185 0.1429376
se38 5.3766487 204.2335 0.0183503 144.2809 1 -271.3662 0.0245138
se26 3.3312414 203.9319 0.0163923 153.3073 1 -271.4067 0.0154002

In this plot, we begin to see the emergence of structure in the likelihood surface. In particular, what looks like a ridge of high likelihood is visible in the \(r\)-\(\sigma\) projection. Such structures are very interesting in that they contain clues as to the manner in which the model is fitting the data. They can also pose challenges to efficient estimation, since climbing up to a ridge is harder than traversing it.

Likelihood profile

We can improve the quality of our estimates and obtain likelihood-ratio-test-based confidence intervals by constructing profile likelihoods. In a likelihood profile, one varies the focal parameter (or parameters) across some range, maximizing the likelihood over the remaining parameters at each value of the focal parameter. The following code constructs a likelihood profile over \(r\).
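One way to lay out such a computation uses pomp’s profile_design to generate starting points: a grid over the focal parameter \(r\), crossed with random draws of the remaining parameters. The bounds below are assumptions, chosen roughly from the ranges of our earlier estimates.

```r
library(pomp)

## 50 values of r, each paired with 20 random starting points for the
## other parameters (bounds are illustrative assumptions).
profile_design(
  r = seq(from = 0.05, to = 12, length = 50),
  lower = c(K = 197, sigma = 0.31, N_0 = 139, b = 1),
  upper = c(K = 340, sigma = 1.18, N_0 = 160, b = 1),
  nprof = 20
) -> guesses
dim(guesses)  # 1000 starting points, 5 parameters
```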

##                r        K     sigma      N_0 b
## [1,]  0.04234094 197.5124 0.3066363 138.6612 1
## [2,] 12.10934551 340.2362 1.1801442 159.9376 1
## [1] 1000    5

Notice that we’ve changed the mif2 perturbations (rw.sd): we’ve removed the perturbation on the \(r\) parameter, since we want to hold this parameter fixed.

We add these points to our database:

Next, we plot the likelihood profile. The following plot shows the top two estimates for each value of \(r\) with error bars showing ±2 s.e. and a loess smooth.

We see that the 95% likelihood ratio test confidence interval (CI) appears to be one-sided: the CI is roughly \(r>0.75\).

If we plot the profile trace of \(\sigma\), we see that, as we increase \(r\), we have to increase the intensity of the environmental stochasticity to maintain a good fit. Why is this?


Exercise

Construct a likelihood profile for the \(K\) parameter. Plot the profile and profile traces. Comment on your findings.


Model criticism

Estimating model parameters by fitting the model to data is typically only a step in the process of trying to understand the processes that generated the data. The next step involves trying to understand how and why the model fits the data the way it does, whether it fits it well, and what scope for improvement there might be. pomp provides a number of tools to facilitate answering these questions through interaction with a fitted model.

Simulation of the fitted model

Ultimately, since the model is viewed, at least hypothetically, as the process that generated the data, simulation of the fitted model is a central tool we have for model criticism. Let’s plot the data and several simulated realizations of the model process on the same axes.

The first lines above simply extract the maximum likelihood estimates (mle) from our profile computation. The next pair of lines plug these MLE parameters into a ‘pomp’ object (mlepomp) containing the model and the data. The last set of lines do the simulation and the plotting.

Although it is clear from these plots that the estimated model has more variability and is thus better able to explain the data, it can be hard to read much from spaghetti plots such as this one. It’s almost always a good idea to plot the data together with several simulated realizations to help assess how similar the two are.

Model checking with probes

Visual comparison of simulations and data is always a good idea. An indication that the data are not a plausible realization of the model is evidence for lack of fit. In particular, if we have any set of summary statistics, or probes, we can apply them to both simulated and actual data. The probe function facilitates this comparison. Let’s perform this operation using several of the summary statistics provided with pomp: we’ll use the mean, several quantiles, and the autocorrelation at lags 1 and 3.
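A sketch of the probe call: mlepomp is the fitted ‘pomp’ object from above, and "n" is an assumed name for the observed variable.

```r
library(pomp)

## Compare 200 simulations to the data via a handful of summary statistics.
mlepomp |>
  probe(
    probes = list(
      mean = probe_mean("n"),
      probe_quantile("n", probs = c(0.05, 0.25, 0.5, 0.75, 0.95)),
      probe_acf("n", lags = c(1, 3))
    ),
    nsim = 200
  ) -> vp_probe
vp_probe  # printing it reveals its class
```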

## <object of class 'probed_pomp'>

For ‘probed_pomp’ objects, there are summary and plot methods. There is also an as.data.frame method.

## $coef
##            r            K        sigma          N_0            b       loglik 
##    3.6812237  212.7191314    0.6503141  150.4024136    1.0000000 -141.4531063 
##    loglik.se 
##    0.1213238 
## 
## $nsim
## [1] 200
## 
## $quantiles
##   mean   q.5%  q.25%  q.50%  q.75%  q.95% acf[1] acf[3] 
##  0.325  0.410  0.575  0.205  0.230  0.665  0.595  0.825 
## 
## $pvals
##      mean      q.5%     q.25%     q.50%     q.75%     q.95%    acf[1]    acf[3] 
## 0.6567164 0.8258706 0.8159204 0.4179104 0.4676617 0.6766169 0.8159204 0.3582090 
## 
## $synth.loglik
## [1] -18.82362

Evidently, summary returns a list with several elements. The quantiles element contains, for each probe, what fraction of the nsim simulations had probe values below the value of the probe applied to the data. The pvals element contains \(P\)-values associated with the two-sided test of the hypothesis that the data were generated by the model.
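The relationship between the two can be checked by hand. A quantile \(q\) based on nsim simulations corresponds to \(k = q \cdot \text{nsim}\) simulations below the data value, and the reported \(P\)-values are consistent with the standard simulation-based formula \((2\min(k,\text{nsim}-k)+2)/(\text{nsim}+1)\). This reconstruction is inferred from the numbers above, not quoted from pomp’s source.

```r
## Recompute the first two reported P-values from the reported quantiles.
nsim <- 200
q <- c(mean = 0.325, `q.5%` = 0.410)  # quantiles from summary() above
k <- q * nsim                         # simulations below the data value
pval <- (2 * pmin(k, nsim - k) + 2) / (nsim + 1)
round(pval, 7)  # 0.6567164 0.8258706, matching the pvals element
```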

The plot depicts the multivariate distribution of the probes under the model, with the data-values superimposed. On the diagonal, we see the marginal distributions of the individual probes, represented as histograms, with the vertical line signifying the value of the corresponding probe on the data. Above the diagonal, the scatterplots show the pairwise distributions of the probes, with crosshairs marking the corresponding data-values. Below the diagonal, the panels contain the pairwise correlations among the simulated probes.

Next steps

To this point, we’ve seen how to implement POMP models, simulate them, compute and maximize likelihoods, and perform certain kinds of diagnostic checks. The pomp website contains more documentation, including the full package manual, and a variety of tutorials and short courses. The package itself contains a number of built-in examples and datasets that can be explored.

pomp provides a large toolbox of different inference methods, only a few of which have been explored here. In particular, the package provides other methods for parameter estimation, in both frequentist and Bayesian modes (see, for example, abc, pmcmc, probe.match, spect.match, nlf, enkf, and bsmc2). It also provides a variety of tools for model checking (spect, nlf). It is frequently the case that an analysis making use of more than one method has advantages over more “purist” approaches: the main goal of the package is to facilitate effective inference by bringing a variety of tools, with complementary strengths and weaknesses, to the user in a common format.

Although the goal of this document has been to introduce the beginner to the package through a display of the pomp toolbox in the context of a rudimentary and incomplete data analysis of a short time series with toy models, it is important to realize that these tools have proven their utility on some extremely challenging problems, including some for which other existing methods are either less efficient or entirely infeasible. The bibliography has links to peer-reviewed publications that have used these methods.


This document was produced using pomp version 4.3 and R version 4.2.1.


References

Arulampalam MS, Maskell S, Gordon N, Clapp T (2002). “A Tutorial on Particle Filters for Online Nonlinear, Non-Gaussian Bayesian Tracking.” IEEE Transactions on Signal Processing, 50, 174–188. https://doi.org/10.1109/78.978374.

Ionides EL, Bretó C, King AA (2006). “Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences of the U.S.A., 103(49), 18438–18443. https://doi.org/10.1073/pnas.0603181103.

Ionides EL, Nguyen D, Atchadé Y, Stoev S, King AA (2015). “Inference for Dynamic and Latent Variable Models via Iterated, Perturbed Bayes Maps.” Proceedings of the National Academy of Sciences of the U.S.A., 112(3), 719–724. https://doi.org/10.1073/pnas.1410597112.

McCleery RH, Perrins CM (1991). “Effects of Predation on the Numbers of Great Tits Parus Major.” In Bird population studies. Relevence to conservation and management 129–147. Oxford University Press, Oxford.

Ricker WE (1954). “Stock and Recruitment.” Journal of the Fisheries Research Board of Canada, 11, 559–623. https://doi.org/10.1139/f54-039.
